MUC-5 evaluation metrics
Abstract
The MUC-5 Scoring System is evaluation software that aligns and scores the templates produced by the information extraction systems under evaluation in comparison to an "answer key" created by humans. The Scoring System produces comprehensive summary reports showing the overall scores for the templates in the test set; these may be supplemented by detailed score reports showing scores for each template individually. Figure 1 shows a sample summary score report in the joint ventures task domain for the error metrics; Figure 2 shows a corresponding summary score report for the recall-precision metrics.
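The recall-precision scoring described above can be illustrated with a minimal sketch of MUC-style slot tallying. The function name and the example counts below are hypothetical illustrations, not values from the reports in Figures 1 or 2; the half-credit convention for partial matches follows the standard MUC scoring definitions.

```python
def muc_scores(correct, partial, incorrect, missing, spurious):
    """Compute MUC-style recall and precision from slot-level tallies.

    possible = slots in the human answer key;
    actual   = slots produced by the system under evaluation.
    Partial matches count as half credit.
    """
    possible = correct + partial + incorrect + missing
    actual = correct + partial + incorrect + spurious
    recall = (correct + 0.5 * partial) / possible if possible else 0.0
    precision = (correct + 0.5 * partial) / actual if actual else 0.0
    return recall, precision

# Hypothetical tallies for illustration only:
r, p = muc_scores(correct=70, partial=10, incorrect=5, missing=15, spurious=8)
print(f"recall={r:.3f} precision={p:.3f}")
```

The same tallies also feed the error metrics (error per response fill and undergeneration/overgeneration), which summarize the complementary mistake counts rather than the matches.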
Similar resources
MUC-4 evaluation metrics
The MUC-4 evaluation metrics measure the performance of the message understanding systems. This paper describes the scoring algorithms used to arrive at the metrics as well as the improvements that were made to the MUC-3 methods. MUC-4 evaluation metrics were stricter than those used in MUC-3. Given the differences in scoring between MUC-3 and MUC-4, the MUC-4 systems' scores represent a lar...
MUC-3 evaluation metrics
The MUC-3 evaluation metrics are measures of performance for the MUC-3 template fill task. Obtaining summary measures of performance necessitates the loss of information about many details of performance. The utility of summary measures for comparison of performance over time and across systems should outweigh this loss of detail. The template fill task is complex because of the varying natu...
Survey Of The Message Understanding Conferences
In this paper, the Message Understanding Conferences are reviewed, and the natural language system evaluation that is underway in preparation for the next conference is described. The role of the conferences in the evaluation of information extraction systems is assessed in terms of the purposes of three broad classes of evaluation: progress, adequacy, and diagnostic. The conferences have measu...
Critical Reflections on Evaluation Practices in Coreference Resolution
In this paper we revisit the task of quantitative evaluation of coreference resolution systems. We review the most commonly used metrics (MUC, B³, CEAF and BLANC) on the basis of their evaluation of coreference resolution in five texts from the OntoNotes corpus. We examine both the correlation between the metrics and the degree to which our human judgement of coreference resolution agrees with t...
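The link-based MUC metric that this snippet (and the next) reviews can be sketched in a few lines, assuming chains are given as sets of mention ids. This is an illustrative sketch of the standard link-counting definition, not the reference scorer; the function name is hypothetical.

```python
def muc_link_recall(key_chains, response_chains):
    """Link-based MUC recall: fraction of key links recovered.

    key_chains / response_chains: lists of sets of mention ids.
    For each key chain S, the response partitions S into p(S) pieces;
    recall = sum(|S| - |p(S)|) / sum(|S| - 1).
    MUC precision is the same computation with the arguments swapped.
    """
    num = den = 0
    for s in key_chains:
        parts = {}       # response-chain index -> mentions of s it covers
        singletons = 0   # mentions of s absent from every response chain
        for m in s:
            for i, r in enumerate(response_chains):
                if m in r:
                    parts.setdefault(i, set()).add(m)
                    break
            else:
                singletons += 1
        p = len(parts) + singletons
        num += len(s) - p
        den += len(s) - 1
    return num / den if den else 0.0
```

For example, with key chain {1, 2, 3} and response chains {1, 2} and {3}, one of the two key links is recovered, so recall is 0.5, while every response link is correct, so precision (arguments swapped) is 1.0.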
Instance Sampling for Multilingual Coreference Resolution
In this paper we investigate the effect of downsampling negative training instances on a multilingual memory-based coreference resolution approach. We report results on the SemEval-2010 task 1 data sets for six different languages (Catalan, Dutch, English, German, Italian and Spanish) and for four evaluation metrics (MUC, B³, CEAF, BLANC). Our experiments show that downsampling negative training...